class: center, middle, inverse, title-slide

# Lecture 24

## Generalized Linear Models: Introduction

### Psych 10 C

### University of California, Irvine

### 05/27/2022

---

## Bounds

- One of the problems with linear models is that their predictions are not bounded.

--

- For example, the interaction model from the previous example predicted that, with no time between study and test in a recognition memory task, elderly participants could correctly recognize 104 words out of a list of only 100.

--

- This problem comes from the Normality assumption:

$$ y_i \sim \text{Normal}(\mu, \sigma^2)$$

--

- This distribution has support on all the real numbers, which means that the model is not restricted to the values that our dependent variable can actually take.

--

- Nothing in the model states that the values of our dependent variable have to be between 0 and 100. The model assumes they could be any value from `\(-\infty\)` to `\(\infty\)`.

---

## Solution

- To solve this problem, we first need to assume that our data follow a distribution other than the Normal.

--

- The problem is: which distribution should we use?

--

- As with other problems we have discussed before, there is no perfect answer to this question. However, we have to keep in mind that whichever distribution we choose will come with its own assumptions about how our data behave.

--

- The key idea is that, regardless of the distribution we choose, we would like to keep using a linear function. So the challenge is: how do we get rid of the boundaries in our data?

---

## Binomial model: Logistic regression

- One common model in the literature is the logistic regression model.

--

- This model is used when we have one or more binary outcomes.

--

- A binary outcome just means that there are only two possibilities: something happens or it doesn't.

--

- The classic example is a coin toss.
Every time we toss a coin the outcome will be either "Heads" or "Tails": it can't be both, and those are the only results we expect.

---

## Example

- A current example would be testing positive for covid.

--

- If we get tested, the result will be positive (patient is diagnosed with covid) or negative (patient is not diagnosed with covid).

--

- If we don't know anything about the person being tested, then there is some probability that the test comes back positive and some probability that it comes back negative.

--

- Let's assume that the test is perfect: if a person tests positive it means that they are infected with covid, and if they test negative it means that they are not.

--

- This is not how real tests work, but we can assume it for convenience for now.

--

- Now imagine that we have two populations: one group of people who are vaccinated and another group who are not.

--

- As researchers we are interested in whether the vaccine works or not, so we start taking samples from both groups.

---

## Example

- We know that the vaccine does not offer 100% protection; however, if it works, then we should see a difference in the proportion of cases in the two populations.

--

- We test 30 participants from each population and get the following results:

--

| Vaccination status | Sample size | Positive tests |
|--------------------|:-----------:|:--------------:|
| Not vaccinated     | 30          | 7              |
| Vaccinated         | 30          | 4              |

--

- We can see that the number of positive cases differs between the two populations; however, this alone is not enough for us to say that the vaccine works.

--

- First of all, what does it mean for a vaccine to work?

--

- Well, we can define it as reducing the probability of testing positive.

---

## Example

- Now we have a question that we can formalize.

--

- We want to know if being vaccinated changes the probability of testing positive for covid.
Remember that we are using a perfect test in this experiment.

--

- Now we have a similar problem as before: we want to compare the probabilities of testing positive for covid, but probabilities can only take values between 0 and 1.

--

- So we need a way to get rid of those boundaries.

--

- We will need some new notation.

---

## Bernoulli Distribution

- First we need a new model. In this case, we will consider each observation (whether a participant tests positive or not) to be independent from the rest.

--

- Additionally, we will assume that each observation (positive or negative test) follows a Bernoulli distribution.

--

- The Bernoulli distribution has one parameter, known as `\(\theta\)`, which indicates the probability of a `\(1\)`.

--

- In other words, this assumption means that our observations take the value `\(0\)` (negative test) with probability `\(1-\theta\)` and the value `\(1\)` (positive test) with probability `\(\theta\)`.

--

- Given that `\(\theta\)` is a probability, we know that it has to take a value between `\(0\)` and `\(1\)`.

---

## Transformation of `\(\theta\)`

- Now that we have our probability parameter `\(\theta\)`, we can work on removing the bounds.

--

- We call this process a "transformation", which means that we will use functions (rules) that take a value between `\(0\)` and `\(1\)` and map it to a value between `\(-\infty\)` and `\(\infty\)`.

--

- First we will get rid of the upper bound. To do this we simply divide the probability of a positive test, `\(\theta\)`, by its complement `\(1-\theta\)`, the probability of a negative test:

`$$\frac{\theta}{(1-\theta)}$$`

--

- This new value is known as the "**odds**".

---

## Odds

<img src="data:image/png;base64,#lec-25_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Logarithm of the odds

- The odds can take any value between `\(0\)` and `\(\infty\)`, so now the only thing left to do is to get rid of the lower bound.
If we do that, then we will be able to use a straight line as our prediction again.

--

- To remove the lower bound from the odds, we can use the natural logarithm.

--

- This function takes any value greater than one and makes it larger, while values below one become negative; the value `\(1\)` is transformed into `\(0\)`.

`$$log\left(\frac{\theta}{(1-\theta)}\right)$$`

--

- This is known as the "**log-odds**" and is used in almost all models for binary data.

---

## Log odds

<img src="data:image/png;base64,#lec-25_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

## A model for the expected value

- When we were working with the Normal distribution, we used the linear function as a model for the expected value `\(\mu\)`.

--

- In the case of the Bernoulli distribution, we want to use a linear model for our transformation, the log-odds.

--

- This works because the expected value of the Bernoulli distribution is actually its probability `\(\theta\)`.

--

- The key part is that, since we know all the steps we took to get from a value of `\(\theta\)` to a value of the log-odds, we can now revert that process.

--

- This means that we can have models that look like this:

`$$log\left(\frac{\theta_i}{(1-\theta_i)}\right) = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2}+\cdots+\beta_kx_{ik}$$`

--

- Then, with "some" algebra, we can recover a value for the mean of the Bernoulli distribution, `\(\theta\)`. In other words, we will have a model for the expected value, which we can interpret.
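
---

## Sketch: simulating Bernoulli outcomes

The Bernoulli assumption above can be illustrated with a short simulation. This is a sketch, not part of the lecture's analysis: the sample sizes come from the vaccine table, but the `theta` values used to generate data are hypothetical illustrative choices (set equal to the observed proportions), not known truths.

```python
import random

def bernoulli(theta, rng):
    # Bernoulli(theta): returns 1 (positive test) with probability theta,
    # and 0 (negative test) with probability 1 - theta.
    return 1 if rng.random() < theta else 0

rng = random.Random(1)

# Hypothetical probabilities, chosen to match the observed proportions.
theta_unvacc = 7 / 30
theta_vacc = 4 / 30

# Simulate 30 independent tests per group, as in the table.
sample_unvacc = [bernoulli(theta_unvacc, rng) for _ in range(30)]
sample_vacc = [bernoulli(theta_vacc, rng) for _ in range(30)]

# The sample mean of 0/1 outcomes estimates theta for each group.
print(sum(sample_unvacc) / 30, sum(sample_vacc) / 30)
```

Re-running with different seeds shows sampling variability: the simulated counts bounce around the true `theta`, which is why the raw difference 7 vs. 4 is not, by itself, enough to conclude the vaccine works.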
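
---

## Sketch: the two transformation steps

The two transformations described in the slides can be written as two small functions. This is a minimal sketch in Python (the function names are my own labels, not notation from the lecture): `odds` removes the upper bound, and `log_odds` then removes the lower bound.

```python
import math

def odds(theta):
    # Step 1: theta in (0, 1) -> theta / (1 - theta) in (0, infinity).
    return theta / (1 - theta)

def log_odds(theta):
    # Step 2: taking the natural log maps (0, infinity) onto
    # (-infinity, infinity); odds of exactly 1 become 0.
    return math.log(odds(theta))

# A few probabilities, including the two observed proportions from the table.
for theta in (0.1, 0.5, 7 / 30, 4 / 30):
    print(theta, odds(theta), log_odds(theta))
```

Note that `theta = 0.5` gives odds of exactly 1 and log-odds of exactly 0, and that probabilities below 0.5 give negative log-odds, matching the description of the logarithm above.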
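
---

## Sketch: reverting the transformation

The "some algebra" mentioned on the last slide can be made concrete. Solving `\(log(\theta/(1-\theta)) = \eta\)` for `\(\theta\)` gives `\(\theta = 1/(1+e^{-\eta})\)`, the inverse logit. The sketch below uses the proportions from the vaccine table with a single binary predictor (1 = vaccinated); this saturated two-group fit is my illustration, not a model estimated in the lecture.

```python
import math

def log_odds(theta):
    return math.log(theta / (1 - theta))

def inv_logit(eta):
    # Solving log(theta / (1 - theta)) = eta for theta
    # gives theta = 1 / (1 + exp(-eta)).
    return 1 / (1 + math.exp(-eta))

# Observed proportions of positive tests from the table.
p_unvacc = 7 / 30
p_vacc = 4 / 30

# With one binary predictor x (1 = vaccinated), the model
# log-odds(theta_i) = b0 + b1 * x_i fits both groups exactly:
b0 = log_odds(p_unvacc)       # intercept: log-odds for the unvaccinated group
b1 = log_odds(p_vacc) - b0    # slope: change in log-odds when vaccinated

# Inverting the transformation recovers the group probabilities.
print(inv_logit(b0))          # unvaccinated: back to 7/30
print(inv_logit(b0 + b1))     # vaccinated: back to 4/30
```

The negative slope `b1` says that being vaccinated lowers the log-odds, and therefore the probability, of testing positive, which is exactly the quantity the research question asks about.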